Abstract. A weighted finite-state machine with n tapes (n-WFSM) defines
a rational relation on n strings. The paper recalls important operations
on these relations, and an algorithm for their auto-intersection.
Through a series of practical applications, it investigates the augmented
descriptive power of n-WFSMs, w.r.t. classical 1- and 2-WFSMs (acceptors
and transducers). Some applications are not feasible with the latter.



1 Introduction 

A weighted finite-state machine with n tapes (n-WFSM) [33,7,14,10,12] defines
a rational relation on n strings. It is a generalization of weighted acceptors (one
tape) and transducers (two tapes).

This paper investigates the potential of n-ary rational relations (resp. n- 
WFSMs) compared to languages and binary relations (resp. acceptors and trans- 
ducers), in practical tasks. All described operations and applications have been 
implemented with Xerox's WFSC tool [17]. 

The paper is organized as follows: Section 2 recalls some basic definitions
about n-ary weighted rational relations and n-WFSMs. Section 3 summarizes
some central operations on these relations and machines, such as join and
auto-intersection. Unfortunately, due to Post's Correspondence Problem, there
cannot exist a fully general auto-intersection algorithm. Section 4 recalls a
restricted algorithm for a class of n-WFSMs. Section 5 demonstrates the augmented
descriptive power of n-WFSMs through a series of practical applications, namely
the morphological analysis of Semitic languages (5.1), the preservation of
intermediate results in transducer cascades (5.2), the induction of morphological
rules from corpora (5.3), the alignment of lexicon entries (5.4), the automatic
extraction of acronyms and their meaning from corpora (5.5), and the search for
cognates in a bilingual lexicon (5.6).



* Sections 2-4 are based on published results [18,19,20,4], obtained at Xerox Research
Centre Europe (XRCE), Meylan, France, through joint work between Jean-Marc
Champarnaud (Rouen Univ.), Jason Eisner (Johns Hopkins Univ.), Franck Guingne
and Florent Nicart (XRCE and Rouen Univ.), and the author.
2 Definitions

We recall some definitions about n-ary weighted relations and their machines,
following the usual definitions for multi-tape automata [7,6], with semiring weights
added just as for acceptors and transducers [24,27]. For more details see [15].

A weighted n-ary relation is a function from $(\Sigma^*)^n$ to $K$, for a given finite
alphabet $\Sigma$ and a given weight semiring $\mathcal{K} = \langle K, \oplus, \otimes, \bar{0}, \bar{1}\rangle$. A relation assigns
a weight to any n-tuple of strings. A weight of $\bar{0}$ can be interpreted as meaning
that the tuple is not in the relation. We are especially interested in rational (or
regular) n-ary relations, i.e. relations that can be encoded by n-tape weighted
finite-state machines, that we now define.

We adopt the convention that variable names referring to n-tuples of strings
include a superscript $(n)$. Thus we write $s^{(n)}$ rather than $s$ for a tuple of strings
$\langle s_1, \ldots s_n\rangle$. We also use this convention for the names of objects that contain
n-tuples of strings, such as n-tape machines and their transitions and paths.

An n-tape weighted finite-state machine (n-WFSM) $A^{(n)}$ is defined by a six-tuple
$A^{(n)} = \langle \Sigma, Q, \mathcal{K}, E^{(n)}, \lambda, \varrho\rangle$, with $\Sigma$ being a finite alphabet, $Q$ a finite set
of states, $\mathcal{K} = \langle K, \oplus, \otimes, \bar{0}, \bar{1}\rangle$ the semiring of weights,
$E^{(n)} \subseteq (Q \times (\Sigma^*)^n \times K \times Q)$ a finite set of weighted n-tape transitions,
$\lambda : Q \to K$ a function that assigns initial weights to states, and
$\varrho : Q \to K$ a function that assigns final weights to states.

Any transition $e^{(n)} \in E^{(n)}$ has the form $e^{(n)} = \langle y, \ell^{(n)}, w, t\rangle$. We refer to
these four components as the transition's source state $y(e^{(n)}) \in Q$, its label
$\ell(e^{(n)}) \in (\Sigma^*)^n$, its weight $w(e^{(n)}) \in K$, and its target state $t(e^{(n)}) \in Q$. We refer
by $E(q)$ to the set of out-going transitions of a state $q \in Q$ (with $E(q) \subseteq E^{(n)}$).

A path $\gamma^{(n)}$ of length $k \geq 0$ is a sequence of transitions $e_1^{(n)} e_2^{(n)} \cdots e_k^{(n)}$ such
that $t(e_i^{(n)}) = y(e_{i+1}^{(n)})$ for all $i \in [1, k-1]$. The label of a path is the element-wise
concatenation of the labels of its transitions. The weight of a path $\gamma^{(n)}$ is

$$ w(\gamma^{(n)}) =_{def} \lambda(y(e_1^{(n)})) \otimes \Big( \bigotimes_{j \in [1,k]} w(e_j^{(n)}) \Big) \otimes \varrho(t(e_k^{(n)})) \qquad (1) $$

The path is said to be successful, and to accept its label, if $w(\gamma^{(n)}) \neq \bar{0}$.
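
As a minimal sketch of Eq. (1), the path weight can be computed as follows. The tropical semiring (min, +), the tuple layout `(source, label, weight, target)` and the function names are assumptions made for this illustration only; the sketch also assumes a non-empty path.

```python
from functools import reduce

# Tropical semiring (min, +): an assumption for this sketch.
ZERO, ONE = float("inf"), 0.0

def otimes(a, b):
    """Semiring multiplication; in the tropical semiring this is addition."""
    return a + b

def path_weight(path, initial, final):
    """Eq. (1): initial weight (x) all transition weights (x) final weight.
    `path` is a non-empty list of transitions (source, label, weight, target);
    `initial` and `final` map states to weights (missing states weigh ZERO)."""
    # Consecutive transitions must be connected: t(e_i) = y(e_{i+1}).
    assert all(a[3] == b[0] for a, b in zip(path, path[1:]))
    w = initial.get(path[0][0], ZERO)
    w = reduce(otimes, (t[2] for t in path), w)
    return otimes(w, final.get(path[-1][3], ZERO))

def path_label(path):
    """Element-wise concatenation of the n-tuple labels of the transitions."""
    return tuple("".join(parts) for parts in zip(*(t[1] for t in path)))

# A two-transition path of a 3-tape machine (hypothetical data).
path = [(0, ("a", "", "x"), 1.0, 0), (0, ("", "a", "y"), 2.0, 1)]
print(path_label(path), path_weight(path, {0: 0.0}, {1: 0.5}))
```

A path whose weight equals `ZERO` (here, infinity) is not successful.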

3 Operations 

We now recall some central operations on n-ary weighted relations and n-WFSMs
[21]. The auto-intersection operation was introduced with the aim of simplifying
the computation of the join operation. The notation is inspired by relational
databases. For mathematical details of simple operations see [18].

3.1 Simple Operations 

Any n-ary weighted rational relation can be constructed by combining the basic
rational operations of union, concatenation and closure. Rational operations can
be implemented by simple constructions on the corresponding non-deterministic
n-tape WFSMs [34]. These n-tape constructions and their semiring-weighted
versions are exactly the same as for acceptors and transducers, since they are
indifferent to the n-tuple transition labels.

The projection operator $\pi_{\langle j_1, \ldots j_m\rangle}$, with $j_1, \ldots j_m \in [1,n]$, maps an n-ary
relation to an m-ary one by retaining in each tuple components specified by the
indices $j_1, \ldots j_m$ and placing them in the specified order. Indices may occur in
any order, possibly with repeats. Thus the tapes can be permuted or duplicated:
$\pi_{\langle 2,1\rangle}$ inverts a 2-ary relation. The complementary projection operator $\bar{\pi}_{\{j_1, \ldots j_m\}}$
removes the tapes $j_1, \ldots j_m$ and preserves the order of other tapes.
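
The two projection operators can be illustrated on finite relations. The sketch below assumes a relation stored as a Python dict from string tuples to tropical weights; this representation and the names `project` and `drop` are illustrative choices, not part of WFSC.

```python
def project(rel, indices):
    """pi_<j1..jm>: keep the listed tapes (1-based), in the listed order.
    Tuples that collapse onto the same image are combined with the
    tropical semiring sum, i.e. min."""
    out = {}
    for tup, w in rel.items():
        key = tuple(tup[j - 1] for j in indices)
        out[key] = min(out.get(key, float("inf")), w)
    return out

def drop(rel, indices):
    """Complementary projection: remove the listed tapes, keep the rest in order."""
    if not rel:
        return {}
    n = len(next(iter(rel)))
    return project(rel, [j for j in range(1, n + 1) if j not in indices])

rel = {("ab", "x"): 1.0, ("c", "y"): 2.0}
print(project(rel, [2, 1]))     # inverted 2-ary relation
print(project(rel, [1, 1, 2]))  # tape 1 duplicated
print(drop(rel, {1}))           # tape 1 removed
```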



3.2 Join operation 

The n-WFSM join operator differs from database join in that database columns 
are named, whereas our tapes are numbered. Since tapes must explicitly be 
selected by number, join is neither associative nor commutative. 

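For any distinct $i_1, \ldots i_r \in [1,n]$ and any distinct $j_1, \ldots j_r \in [1,m]$, we define
a join operator $\bowtie_{\{i_1=j_1, \ldots i_r=j_r\}}$. It combines an n-ary and an m-ary relation into
an (n + m - r)-ary relation defined as follows:¹

$$ \big( \mathcal{R}_1^{(n)} \bowtie_{\{i_1=j_1, \ldots i_r=j_r\}} \mathcal{R}_2^{(m)} \big) \big( \langle u_1, \ldots u_n, s_1, \ldots s_{m-r}\rangle \big) =_{def} \mathcal{R}_1^{(n)}(u^{(n)}) \otimes \mathcal{R}_2^{(m)}(v^{(m)}) \qquad (2) $$

$v^{(m)}$ being the unique tuple s.t. $\bar{\pi}_{\{j_1, \ldots j_r\}}(v^{(m)}) = s^{(m-r)}$ and $(\forall k \in [1,r])\ v_{j_k} = u_{i_k}$.
Important special cases of join are crossproduct $\mathcal{R}_1^{(n)} \times \mathcal{R}_2^{(m)} = \mathcal{R}_1^{(n)} \bowtie_{\{\}} \mathcal{R}_2^{(m)}$,
intersection $\mathcal{R}_1^{(n)} \cap \mathcal{R}_2^{(n)} = \mathcal{R}_1^{(n)} \bowtie_{\{1=1, \ldots n=n\}} \mathcal{R}_2^{(n)}$, and transducer composition
$\mathcal{R}_1^{(2)} \circ \mathcal{R}_2^{(2)} = \bar{\pi}_{\{2\}}(\mathcal{R}_1^{(2)} \bowtie_{\{2=1\}} \mathcal{R}_2^{(2)})$.

Unfortunately, rational relations are not closed under arbitrary joins [18].
Since the join operation is very useful in practical applications (Sec. 5), it is
helpful to have even a partial algorithm: hence our motivation for studying
auto-intersection.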
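
For finite relations, Eq. (2) can be sketched directly. The dict representation, the tropical weights (so that the semiring product is addition) and the name `join` are assumptions of this illustration; rational relations in general are infinite and need the machine-level construction discussed in Section 4.

```python
def join(r1, r2, pairs):
    """Join two finite weighted relations (dicts: tuple of strings -> weight).
    `pairs` lists 1-based tape pairs (i, j): tape i of r1 must equal tape j of r2.
    The result keeps all tapes of r1, followed by the non-joined tapes of r2."""
    joined_tapes = {j for _, j in pairs}
    out = {}
    for u, wu in r1.items():
        for v, wv in r2.items():
            if all(u[i - 1] == v[j - 1] for i, j in pairs):
                rest = tuple(s for k, s in enumerate(v, start=1)
                             if k not in joined_tapes)
                out[u + rest] = wu + wv   # tropical (x) is +
    return out

# The example of footnote 1 (the empty string stands for epsilon).
r1 = {("abc", "def", ""): 1.0}
r2 = {("def", "ghi", "", "jkl"): 2.0, ("xxx", "ghi", "", "jkl"): 5.0}
print(join(r1, r2, [(2, 1), (3, 3)]))

# Transducer composition as a special case: join on {2=1}, then drop tape 2.
t1, t2 = {("a", "b"): 1.0}, {("b", "c"): 1.0}
composed = {(t[0], t[2]): w for t, w in join(t1, t2, [(2, 1)]).items()}
print(composed)
```

With an empty list of pairs the same function yields the crossproduct.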



3.3 Auto-Intersection 

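For any distinct $i_1, j_1, \ldots i_r, j_r \in [1,n]$, we define an auto-intersection operator
$\sigma_{\{i_1=j_1, i_2=j_2, \ldots i_r=j_r\}}$. It maps a relation $\mathcal{R}^{(n)}$ to a subset of that relation,
preserving tuples whose elements are equal in pairs as specified, but removing
other tuples from the support of the relation.² The formal definition is:

$$ \big( \sigma_{\{i_1=j_1, \ldots i_r=j_r\}}(\mathcal{R}^{(n)}) \big) \big( \langle s_1, \ldots s_n\rangle \big) =_{def} \begin{cases} \mathcal{R}^{(n)}(\langle s_1, \ldots s_n\rangle) & \text{if } (\forall k \in [1,r])\ s_{i_k} = s_{j_k} \\ \bar{0} & \text{otherwise} \end{cases} \qquad (3) $$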



1 For example the tuples $\langle abc, def, \varepsilon\rangle$ and $\langle def, ghi, \varepsilon, jkl\rangle$ combine in the join
$\bowtie_{\{2=1, 3=3\}}$ and yield the tuple $\langle abc, def, \varepsilon, ghi, jkl\rangle$, with a weight equal to the
product of their weights.

2 The requirement that the 2r indices be distinct mirrors the similar requirement on
join and is needed in (5). But it can be evaded by duplicating tapes: the illegal
operation $\sigma_{\{1=2, 2=3\}}(\mathcal{R})$ can be computed as $\bar{\pi}_{\{3\}}(\sigma_{\{1=2, 3=4\}}(\pi_{\langle 1,2,2,3\rangle}(\mathcal{R})))$.
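
The definition in Eq. (3) and the tape-duplication trick of footnote 2 can be sketched on finite relations. As before, the dict representation and the function names are assumptions of this illustration; dropping a tuple from the dict plays the role of assigning it the weight $\bar{0}$.

```python
def auto_intersect(rel, pairs):
    """sigma_{i1=j1,...}: keep tuples whose tapes i_k and j_k (1-based) are equal."""
    return {t: w for t, w in rel.items()
            if all(t[i - 1] == t[j - 1] for i, j in pairs)}

def sigma_12_23_via_duplication(rel):
    """sigma_{1=2,2=3} of a 3-ary relation, computed as in footnote 2."""
    dup = {(t[0], t[1], t[1], t[2]): w for t, w in rel.items()}   # pi_<1,2,2,3>
    kept = auto_intersect(dup, [(1, 2), (3, 4)])                  # sigma_{1=2,3=4}
    return {(t[0], t[1], t[3]): w for t, w in kept.items()}       # remove tape 3

rel = {("a", "a", "a"): 1.0, ("a", "a", "b"): 2.0, ("a", "b", "b"): 3.0}
print(auto_intersect(rel, [(1, 2)]))
print(sigma_12_23_via_duplication(rel))
```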
It is easy to check that auto-intersecting a relation is different from joining
the relation with its own projections. Actually, join and auto-intersection are
related by the following equalities:

$$ \mathcal{R}_1^{(n)} \bowtie_{\{i_1=j_1, \ldots i_r=j_r\}} \mathcal{R}_2^{(m)} = \bar{\pi}_{\{n+j_1, \ldots n+j_r\}} \big( \sigma_{\{i_1=n+j_1, \ldots i_r=n+j_r\}} ( \mathcal{R}_1^{(n)} \times \mathcal{R}_2^{(m)} ) \big) \qquad (4) $$

$$ \sigma_{\{i_1=j_1, \ldots i_r=j_r\}}(\mathcal{R}^{(n)}) = \mathcal{R}^{(n)} \bowtie_{\{i_1=1, j_1=2, \ldots i_r=2r-1, j_r=2r\}} \underbrace{\big( \pi_{\langle 1,1\rangle}(\Sigma^*) \times \cdots \times \pi_{\langle 1,1\rangle}(\Sigma^*) \big)}_{r \text{ times}} \qquad (5) $$

Thus, for any class of difficult join instances whose results are non-rational
or have undecidable properties [18], there is a corresponding class of difficult
auto-intersection instances, and vice-versa. Conversely, a partial solution to one
problem would yield a partial solution to the other.

An auto-intersection on a single pair of tapes is said to be a single-pair
one. An auto-intersection on multiple pairs of tapes can be defined in terms of
multiple single-pair auto-intersections:

$$ \sigma_{\{i_1=j_1, \ldots i_r=j_r\}}(\mathcal{R}^{(n)}) =_{def} \sigma_{\{i_r=j_r\}}( \cdots \sigma_{\{i_1=j_1\}}(\mathcal{R}^{(n)}) \cdots ) \qquad (6) $$



4 Compilation of Auto-Intersection 

We now briefly recall a single-pair auto-intersection algorithm and the class of
bounded delay auto-intersections that this algorithm can handle. For a detailed
exposition see [15].



4.1 Post's Correspondence Problem 

Unfortunately, Post's Correspondence Problem (PCP) [31] can be reduced to
auto-intersection (and hence to join). Actually, any PCP instance can be represented
as an unweighted 2-FSM, and the set of all solutions to the instance equals the
auto-intersection of the 2-FSM [18].

Since it can generally not be decided whether any solution exists to an arbi- 
trary PCP instance, it is also undecidable whether the result of auto-intersection 
is empty. Therefore, no partial auto-intersection algorithm can be "complete" 
in the sense that it always returns a correct n-FSM if it is rational, and always 
terminates with an error code otherwise. Such an algorithm would make PCP 
generally decidable since a returned n-FSM can always be tested for emptiness, 
and an error code indicates non-rationality and hence non-emptiness. 
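
The correspondence between PCP and auto-intersection can be made concrete with a bounded search. In the sketch below each tile (u, v) plays the role of a transition labelled u:v in a one-state 2-FSM, and a solution is a non-empty path whose two tapes carry the same string. The tile set is a standard textbook instance, and the length bound `max_len` is an assumption that makes the search terminate; it is exactly this bound that no general algorithm can supply.

```python
from itertools import product

def pcp_solutions(tiles, max_len):
    """Index sequences of length 1..max_len whose top and bottom strings agree."""
    found = []
    for k in range(1, max_len + 1):
        for seq in product(range(len(tiles)), repeat=k):
            top = "".join(tiles[i][0] for i in seq)
            bottom = "".join(tiles[i][1] for i in seq)
            if top == bottom:     # the pair survives auto-intersection on {1=2}
                found.append((seq, top))
    return found

tiles = [("a", "baa"), ("ab", "aa"), ("bba", "bb")]
print(pcp_solutions(tiles, 4))
```

If the search returns nothing for a given bound, nothing can be concluded about longer sequences.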



4.2 A class of rational auto-intersections 

Although there cannot exist a fully general algorithm, $A^{(n)} = \sigma_{\{i=j\}}(A_1^{(n)})$ can be
compiled for a class of triples $\langle A_1^{(n)}, i, j\rangle$ whose definition is based on the notion
of delay [8,26]. The delay $\delta_{\langle i,j\rangle}(s^{(n)})$ is the difference of length of the strings $s_i$
and $s_j$ of the tuple $s^{(n)}$: $\delta_{\langle i,j\rangle}(s^{(n)}) = |s_i| - |s_j|$ ($i,j \in [1,n]$). We call the delay
bounded if its absolute value does not exceed some limit. The delay of a path $\gamma^{(n)}$
results from its labels on tapes i and j: $\delta_{\langle i,j\rangle}(\gamma^{(n)}) = |(\ell(\gamma^{(n)}))_i| - |(\ell(\gamma^{(n)}))_j|$.
A path has bounded delay if all its prefixes have bounded delay,³ and an
n-WFSM has bounded delay if all its successful paths have bounded delay.
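
The delay of every prefix of a path can be tracked with a running difference of label lengths. The following sketch assumes a path given as a list of n-tuple labels; the function names are illustrative. It reproduces the two labelings discussed in footnote 3.

```python
def prefix_delays(labels, i, j):
    """Delay |s_i| - |s_j| (1-based tapes) after each prefix of a path,
    given the list of its transition labels (tuples of strings)."""
    delays, d = [], 0
    for lab in labels:
        d += len(lab[i - 1]) - len(lab[j - 1])
        delays.append(d)
    return delays

def max_abs_delay(labels, i, j):
    """Largest absolute delay reached by any prefix of the path."""
    return max((abs(d) for d in prefix_delays(labels, i, j)), default=0)

h = 5
alternating = [("ab", ""), ("", "xz")] * h          # (<ab,e><e,xz>)^h
sequential = [("ab", "")] * h + [("", "xz")] * h    # <ab,e>^h <e,xz>^h
print(max_abs_delay(alternating, 1, 2), max_abs_delay(sequential, 1, 2))
```

The first value stays at 2 for any h, while the second grows as 2h.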

As earlier reported [10], if an n-WFSM does not contain a path traversing
both a cycle with positive and a cycle with negative delay w.r.t. tapes i
and j,⁴ then the delay of all paths of its auto-intersection $A^{(n)} = \sigma_{\{i=j\}}(A_1^{(n)})$ is
bounded by some $\delta^{\max}_{\langle i,j\rangle}$, and this bound can be compiled from $A_1^{(n)}$.



4.3 An auto-intersection algorithm 

Our algorithm for the above mentioned class of rational auto-intersections
proceeds in three steps [19,20]:

1. Test whether the triple $\langle A_1^{(n)}, i, j\rangle$ fulfills the above conditions.
   If not, then the algorithm exits with an error code.

2. Calculation of the bound $\delta^{\max}_{\langle i,j\rangle}$ for the delay of the auto-intersection
   $A^{(n)} = \sigma_{\{i=j\}}(A_1^{(n)})$.

3. Construction of the auto-intersection within the bound.



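[Figure 1 not reproduced: panel (a) shows a 3-WFSM $A_1^{(3)}$ with transitions labeled a:ε:x/w and ε:a:y/w'; panel (b) shows its auto-intersection $\sigma_{\{1=2\}}(A_1^{(3)})$, whose states are annotated with their source state and leftover strings.]

Fig. 1. (a) A 3-WFSM and (b) its auto-intersection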



Figure 1 illustrates step 3 of the algorithm: State 0, the initial state of $A_1^{(3)}$,
is copied as initial state 10 to $A^{(3)}$. Its annotation, (0, (ε,ε)), indicates that it is
a copy of state 0 and has leftover strings (ε,ε). Then, all out-going transitions
of state 0 and their target states are copied to $A^{(3)}$, as states 11 and 13. A
transition is copied with its original label and weight. The annotation of state 11
indicates that it is a copy of state 0 and has leftover strings (a,ε). These leftover
strings result from concatenating the leftover strings of state 10, (ε,ε), with the
relevant components, (a,ε), of the transition label a:ε:x. For each newly created
state $q \in Q_A$, we access the corresponding state $q_1 \in Q_{A_1}$, and copy $q_1$'s
out-going transitions with their target states to $A^{(3)}$, until all states of $A^{(3)}$ have
been processed.

3 Any finite path has bounded delay (since its label is of finite length). An infinite
path (traversing cycles) may have bounded or unbounded delay. For example, the
delay of a path labeled with $(\langle ab, \varepsilon\rangle \langle \varepsilon, xz\rangle)^h$ is bounded by 2 for any h, whereas that
of a path labeled with $\langle ab, \varepsilon\rangle^h \langle \varepsilon, xz\rangle^h$ is unbounded for $h \to \infty$.

4 Note that the n-WFSM may have cycles of both types, but not on the same path.

State 12 is not created because the delay of its leftover strings (aa,ε) exceeds
the pre-calculated bound of $\delta^{\max}_{\langle 1,2\rangle} = 1$. The longest common prefix of the two
leftover strings of a state is removed. Hence state 14 has leftover strings (ε,ε)
instead of (a,ε)(ε,a) = (a,a). A final state is copied with its original weight if
it has leftover strings (ε,ε), and with weight $\bar{0}$ otherwise. Therefore, state 14 is
final and state 13 is not.

The construction is proven to be correct and to terminate [19,20]. It can be
performed simultaneously on multiple pairs of tapes.
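
Step 3 can be sketched as a breadth-first construction over pairs of an original state and its leftover strings. This is a simplified illustration, not the published algorithm: the bound is passed in rather than computed (steps 1 and 2 are skipped), the data layout is hypothetical, and pruning states whose two leftover strings are both non-empty (they can never be reconciled) is an assumption added here.

```python
from collections import deque

def common_prefix_len(u, v):
    n = 0
    while n < min(len(u), len(v)) and u[n] == v[n]:
        n += 1
    return n

def auto_intersect_fsm(transitions, initial, finals, i, j, bound):
    """Build sigma_{i=j} of an n-WFSM within a given delay bound.
    transitions: list of (source, label_tuple, weight, target);
    finals: {state: final weight}. New states are (old_state, (left_i, left_j))."""
    start = (initial, ("", ""))
    new_trans, new_finals = [], {}
    seen, queue = {start}, deque([start])
    while queue:
        state = queue.popleft()
        q, (u, v) = state
        if q in finals and (u, v) == ("", ""):
            new_finals[state] = finals[q]      # final only with empty leftovers
        for src, label, w, tgt in transitions:
            if src != q:
                continue
            u2, v2 = u + label[i - 1], v + label[j - 1]
            k = common_prefix_len(u2, v2)      # remove the longest common prefix
            u2, v2 = u2[k:], v2[k:]
            if u2 and v2:                      # tapes disagree: dead end (assumption)
                continue
            if abs(len(u2) - len(v2)) > bound: # outside the pre-computed bound
                continue
            nxt = (tgt, (u2, v2))
            new_trans.append((state, label, w, nxt))
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return start, new_trans, new_finals

# A machine in the spirit of Fig. 1(a): a loop a:e:x on state 0 and e:a:y to state 1.
T = [(0, ("a", "", "x"), 1.0, 0), (0, ("", "a", "y"), 2.0, 1)]
start, trans, fin = auto_intersect_fsm(T, 0, {1: 0.0}, 1, 2, bound=1)
states = {start} | {t[3] for t in trans}
print(sorted(states), fin)
```

With bound 1 the state with leftover strings (aa, ε) is never created, and only the copy of state 1 with empty leftovers is final, which matches the description of states 12, 13 and 14 above.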



5 Applications 

This section focuses on demonstrating the augmented descriptive power of n-WFSMs
w.r.t. 1- and 2-WFSMs (acceptors and transducers), and on exposing the
practical importance of the join operation. It also aims at illustrating how to use
n-WFSMs in practice. Indeed, some of the applications are not feasible with 1-
and 2-WFSMs. The section does not focus on the presented applications per se.



5.1 Morphological Analysis of Semitic Languages 

n-WFSMs have been used in the morphological analysis of Semitic languages
[14,22,23, e.g.].

Table 1 by Kiraz [35] shows the "synchronization" of the quadruple $s^{(4)}$ =
(aa, ktb, waCVCVC, wakatab) in a 4-WFSM representing an Arabic morphological
lexicon. Its first tape encodes a word's vowels, its second the consonants
(representing the root), its third the affixes and the templatic pattern (defining
how to combine consonants and vowels), and its fourth the word's surface form.

Any of the tapes can be used for input or output. For example, for a given 
root and vowel sequence, we can obtain all existing surface forms and templates. 
For a given root and template, we can obtain all existing vowel sequences and 
surface forms, etc. 
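
The synchronization of the four tapes can be imitated on a small finite lexicon. The sketch below is only an illustration under simplifying assumptions: the symbols C and V are reserved for template slots, the lexicon is a plain list of quadruples, and the second entry is an illustrative addition that is not taken from the paper.

```python
def surface_form(vocalism, root, pattern):
    """Fill each 'V' of the pattern with the next vowel and each 'C' with the
    next root consonant; other symbols (affixes) are copied unchanged."""
    vowels, consonants = iter(vocalism), iter(root)
    out = []
    for sym in pattern:
        if sym == "V":
            out.append(next(vowels))
        elif sym == "C":
            out.append(next(consonants))
        else:
            out.append(sym)
    return "".join(out)

def lookup(lexicon, vocalism=None, root=None, pattern=None, surface=None):
    """Query the 4-ary relation with any subset of its tapes fixed."""
    query = (vocalism, root, pattern, surface)
    return [e for e in lexicon
            if all(q is None or q == x for q, x in zip(query, e))]

entries = [("aa", "ktb", "waCVCVC"), ("ui", "ktb", "CVCVC")]
lexicon = [(v, r, p, surface_form(v, r, p)) for v, r, p in entries]
print(lookup(lexicon, vocalism="aa", root="ktb"))
print(lookup(lexicon, surface="kutib"))
```

Because `lookup` accepts any combination of fixed tapes, the same relation serves for analysis (surface form given) and generation (root and vocalism given).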



      a   a       vocalism
    k   t   b     root
w a C V C V C     pattern and affixes
w a k a t a b     surface form

Table 1. Multi-tape-based morphological analysis of Arabic; table adapted from Kiraz [22]
5.2 Intermediate Results in Transduction Cascades

Transduction cascades have been extensively used in language and speech
processing [1,29,25, e.g.].

In a classical weighted transduction cascade (Figure 2), consisting of
transducers $T_1^{(2)} \ldots T_r^{(2)}$, a weighted input language $L_0^{(1)}$, consisting of one or more
words, is composed with the first transducer, $T_1^{(2)}$, on its input tape. The output
projection of this composition is the first intermediate result, $L_1^{(1)}$. It is further
composed with the second transducer, $T_2^{(2)}$, which leads to the second
intermediate result, $L_2^{(1)}$, etc. Generally, $L_i^{(1)} = \pi_{\langle 2\rangle}(L_{i-1}^{(1)} \circ T_i^{(2)})$ ($i \in [1,r]$). The output
projection of the last transducer is the final result, $L_r^{(1)}$.

[Figure 2 not reproduced: the input language is passed through the transducers $T_1^{(2)} \ldots T_r^{(2)}$, each reading on tape 1 and writing on tape 2.]

Fig. 2. Classical 2-WFSM transduction cascade

At any point in the cascade, previous intermediate results cannot be accessed.
This holds also if the cascade is composed into a single transducer:
$T^{(2)} = T_1^{(2)} \circ \cdots \circ T_r^{(2)}$. None of the "incorporated" sub-relations of $T^{(2)}$ can refer to a
sub-relation other than its immediate predecessor.

In a multi-tape transduction cascade, consisting of n-WFSMs $A_1^{(n_1)} \ldots A_r^{(n_r)}$,
any intermediate results can be preserved and used by subsequent transductions.
Figure 3 shows an example where two previous results are preserved at each
point, i.e., each intermediate result, $L_i^{(2)}$, has two tapes. The projection of the
output tape of the last n-WFSM is the final result, $L_r^{(1)}$:

$$ L_1^{(2)} = L_0^{(1)} \bowtie_{\{1=1\}} A_1^{(2)} \qquad (7) $$

$$ L_i^{(2)} = \pi_{\langle 2,3\rangle}( L_{i-1}^{(2)} \bowtie_{\{1=1, 2=2\}} A_i^{(3)} ) \quad (i \in [2, r-1]) \qquad (8) $$

$$ L_r^{(1)} = \pi_{\langle 3\rangle}( L_{r-1}^{(2)} \bowtie_{\{1=1, 2=2\}} A_r^{(3)} ) \qquad (9) $$
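
Equations (7) to (9) can be traced on finite, unweighted relations. In the sketch below relations are sets of string tuples; the toy machines (a segmenter, a tagger that sees the word and its segmentation, and a categorizer that sees the segmentation and the tagging) are hypothetical data chosen only to show that each step has access to two earlier results.

```python
def join(r1, r2, pairs):
    """Unweighted join of two finite relations (sets of tuples); 1-based tape pairs."""
    joined = {j for _, j in pairs}
    return {u + tuple(s for k, s in enumerate(v, 1) if k not in joined)
            for u in r1 for v in r2
            if all(u[i - 1] == v[j - 1] for i, j in pairs)}

def project(rel, indices):
    return {tuple(t[j - 1] for j in indices) for t in rel}

def cascade(L0, A1, machines):
    """Eqs. (7)-(9); `machines` holds the 3-tape relations A_2 ... A_r (r >= 2)."""
    L = join(L0, A1, [(1, 1)])                                      # Eq. (7)
    for A in machines[:-1]:
        L = project(join(L, A, [(1, 1), (2, 2)]), [2, 3])           # Eq. (8)
    return project(join(L, machines[-1], [(1, 1), (2, 2)]), [3])    # Eq. (9)

L0 = {("walks",)}
A1 = {("walks", "walk+s")}
A2 = {("walks", "walk+s", "walk+V+3sg"), ("walks", "walk+s", "walk+N+pl")}
A3 = {("walk+s", "walk+V+3sg", "VERB"), ("walk+s", "walk+N+pl", "NOUN")}
print(cascade(L0, A1, [A2, A3]))
```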

This augmented descriptive power is also available if the whole cascade is
joined into a single 2-WFSM, $A^{(2)}$, although $A^{(2)}$ has only two tapes (in this
example), for input and output, respectively. $A^{(2)}$ can be iteratively constructed
(any $B_i^{(m)}$ is the join of $A_1^{(2)}$ to $A_i^{(3)}$):

$$ B_1^{(2)} = A_1^{(2)} \qquad (10) $$

$$ B_i^{(3)} = \pi_{\langle 1, n-1, n\rangle}( B_{i-1}^{(m)} \bowtie_{\{n-1=1, n=2\}} A_i^{(3)} ) \quad (i \in [2,r],\ m \in \{2,3\}) \qquad (11) $$

$$ A^{(2)} = \pi_{\langle 1,n\rangle}( B_r^{(3)} ) \qquad (12) $$

Each (except the first) of the "incorporated" multi-tape sub-relations in $A^{(2)}$
will still refer to its two predecessors.



[Figure 3 not reproduced: each n-WFSM reads the two preserved tapes of the previous intermediate result on its tapes 1 and 2 and writes its output on tape 3.]

Fig. 3. n-WFSM transduction cascade



5.3 Induction of Morphological Rules 

Induction of morphemes and morphological rules from corpora, both supervised
and unsupervised, is a subfield of NLP on its own [3,9,5, e.g.]. We do not propose
a new method for inducing rules, but rather demonstrate how known steps can
be conveniently performed in the framework of n-ary relations.

Learning morphological rules from a raw corpus can include, among others: 
(1) generating the least costly rule for a given word pair, that rewrites one word 
to the other, (2) identifying the set of pairs over all corpus words where a given 
rule applies, and (3) rewriting a given word by means of one or several rules. 



Construction of a rule generator For any word pair, such as (parler, parlons)
(French, [to] speak, [we] speak), the generator shall provide a rule, such as
".er:ons", suitable for rewriting the first to the second word at minimal cost.
In a rule, a dot shall mean that one or more letters remain unmodified, and an
x:y-part that substring x is replaced by substring y.

We begin with a 4-WFSM that defines rewrite operations:

$$ A_1^{(4)} = \big( \langle \langle ?, ?, ., K\rangle_{\{1=2\}}, 0\rangle \cup \langle \langle ?, \varepsilon, ?, D\rangle_{\{1=3\}}, 0\rangle \cup \langle \langle \varepsilon, ?, ?, I\rangle_{\{2=3\}}, 0\rangle \cup \langle \langle \varepsilon, \varepsilon, :, S\rangle, 0\rangle \big)^* \qquad (13) $$

where ? can be instantiated by any symbol, $\varepsilon$ is the empty string, $\{i=j\}$ a
constraint requiring the ?'s on tapes i and j to be instantiated by the same
symbol [25],⁵ and 0 a weight over the tropical semiring.

Figure 4 shows the graph of $A_1^{(4)}$ and Figure 5 (rows 1-4) the purpose of its
tapes: Tapes 1 and 2 accept any word pair, tape 3 generates a preliminary form
of the rule, and tape 4 generates a sequence of preliminary operation codes. The
following four cases can occur when $A_1^{(4)}$ reads a word pair (cf. Eq. 13):



5 Deviating from [28], we denote symbol constraints similarly to join and auto- 
intersection constraints. 
[Figure 4 not reproduced: a machine with the four transition labels (?,?,.,K)/0, (?,ε,?,D)/0, (ε,?,?,I)/0 and (ε,ε,:,S)/0.]

Fig. 4. Initial form $A_1^{(4)}$ of the rule generator

word 1                  s  w  u  m
word 2                  s  w           i  m
preliminary rule        .  .  u  m  :  i  m
preliminary op. codes   K  K  D  D  S  I  I
final rule              .     u  m  :  i  m
final operation codes   K  k  D  d  S  I  i
weights                 1  0  4  2  0  4  2

Fig. 5. Mapping from the word pair (swum, swim) to various sequences



1. $\langle ?, ?, ., K\rangle_{\{1=2\}}$: two identical letters are accepted, meaning a letter is kept
from word 1 to word 2, which is represented by a "." in the rule and K (keep)
in the operation codes,

2. $\langle ?, \varepsilon, ?, D\rangle_{\{1=3\}}$: a letter is deleted from word 1 to 2, expressed by this letter
in the rule and D (delete) in the operation codes,

3. $\langle \varepsilon, ?, ?, I\rangle_{\{2=3\}}$: a letter is inserted from word 1 to 2, expressed by this letter
in the rule and I (insert) in the operation codes,

4. $\langle \varepsilon, \varepsilon, :, S\rangle$: no letter is matched in either word, a ":" is inserted in the rule,
and an S (separator) in the operation codes.

Next, we compile $C^{(1)}$ that constrains the order of operation codes. For
example, D must be followed by S, I must be preceded by S, etc. The constraints
are enforced through join (Fig. 5, row 4): $A_2^{(4)} = A_1^{(4)} \bowtie_{\{4=1\}} C^{(1)}$.

Then, we create $B_1^{(2)}$ that maps temporary rules to their final form by
replacing a sequence of dots (longest match) by a single dot. We join $B_1^{(2)}$ with
the previous result (Fig. 5, rows 3, 5): $A_3^{(5)} = A_2^{(4)} \bowtie_{\{3=1\}} B_1^{(2)}$.

Next, we compile $B_2^{(2)}$ that creates more fine-grained operation codes. In a
sequence of equal capital letters, it replaces each but the first one with its small
form. For example, DDD becomes Ddd. $B_2^{(2)}$ is joined with the previous result
(Fig. 5, rows 4, 6): $A_4^{(6)} = A_3^{(5)} \bowtie_{\{4=1\}} B_2^{(2)}$.

$C^{(1)}$, $B_1^{(2)}$, and $B_2^{(2)}$ can be compiled as unweighted automata with a tool
such as Xfst [13,2] and then be enhanced with neutral weights.

Finally, we assign weights to the fine-grained operation codes by joining
$B_3^{(1)} = (\langle K,1\rangle \cup \langle k,0\rangle \cup \langle D,4\rangle \cup \langle d,2\rangle \cup \langle I,4\rangle \cup \langle i,2\rangle \cup \langle S,0\rangle)^*$ with the previous
result (Fig. 5, rows 6, 7): $A_5^{(6)} = A_4^{(6)} \bowtie_{\{6=1\}} B_3^{(1)}$.

We keep only the tapes of the word pair and of the final rule in the generator
(Fig. 5, rows 1, 2, 5). All other tapes are of no further use:

$$ G^{(3)} = \pi_{\langle 1,2,5\rangle}( A_5^{(6)} ) \qquad (14) $$
The rule generator $G^{(3)}$ maps any word pair to a finite number of rewrite
rules with different weight, expressing the cost of edit operations. The optimal
rule (with minimal weight) can be found through n-tape best-path search [16].

Using rewrite rules We suppose that the rules generated from random word 
pairs undergo some statistical selection process that aims at retaining only mean- 
ingful rules. 

To facilitate the following operations, a rule's representation can be
changed from a string, such as $s^{(1)}$ = ".er:ons", to a 2-WFSM $r^{(2)}$ encoding the
same relation. This is done by joining the rule with the generator:
$r^{(2)} = \pi_{\langle 1,2\rangle}(G^{(3)} \bowtie_{\{3=1\}} s^{(1)})$. An $r^{(2)}$ resulting from ".er:ons" accepts (on
tape 1) only words ending in "er" and changes (on tape 2) their suffix to "ons".

Similarly, a 2-WFSM $R^{(2)}$ that encodes all selected rules can be generated
by joining the set of all rules (represented as strings) $S^{(1)}$ with the generator:
$R^{(2)} = \pi_{\langle 1,2\rangle}(G^{(3)} \bowtie_{\{3=1\}} S^{(1)})$.

To find all pairs $P_r^{(2)}$ of words from a corpus where a particular rule applies,
we compile the automaton $W^{(1)}$ of all corpus words, and compose it on both
tapes of $r^{(2)}$: $P_r^{(2)} = W^{(1)} \circ r^{(2)} \circ W^{(1)}$. Similarly, identifying all word pairs
$P_R^{(2)}$ over the whole corpus where any of the rules applies (i.e., the set of "valid"
pairs) can be obtained through: $P_R^{(2)} = W^{(1)} \circ R^{(2)} \circ W^{(1)}$.

Rewriting a word $w_1^{(1)}$ with a single rule is done by $w_2^{(1)} = \pi_{\langle 2\rangle}(w_1^{(1)} \circ r^{(2)})$
and $w_1^{(1)} = \pi_{\langle 1\rangle}(r^{(2)} \circ w_2^{(1)})$. Similarly, rewriting a word with all selected
rules is done by $w_2^{(1)} = \pi_{\langle 2\rangle}(w_1^{(1)} \circ R^{(2)})$ and $w_1^{(1)} = \pi_{\langle 1\rangle}(R^{(2)} \circ w_2^{(1)})$.
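
The effect of these compositions can be imitated with plain string operations for the simplest rule shape. The sketch below handles only rules of the restricted form ".x:y" (keep a non-empty prefix, rewrite suffix x to y); this restriction, the function names and the small corpus are assumptions of the illustration, whereas the 2-WFSM $r^{(2)}$ covers rules of any shape.

```python
def apply_suffix_rule(rule, word):
    """Apply a rule of the shape '.x:y' to a word (tape 1 -> tape 2).
    Returns None if the rule does not apply."""
    assert rule.startswith(".") and rule.count(":") == 1
    old, new = rule[1:].split(":")
    if word.endswith(old) and len(word) > len(old):   # the dot needs >= 1 letter
        return word[:len(word) - len(old)] + new
    return None

def rule_pairs(rule, corpus):
    """Pairs (w1, w2) of corpus words related by the rule, in the spirit of
    composing the corpus automaton on both tapes of the rule."""
    words = set(corpus)
    return sorted((w, apply_suffix_rule(rule, w)) for w in words
                  if apply_suffix_rule(rule, w) in words)

corpus = ["parler", "parlons", "chanter", "donner", "donnons"]
print(apply_suffix_rule(".er:ons", "parler"))
print(rule_pairs(".er:ons", corpus))
```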

5.4 String Alignment for Lexicon Construction 

Suppose we want to create a (non-weighted) transducer, $D^{(2)}$, from a list of
word pairs of the form (inflected form, lemma), e.g., (swum, swim), such that
each path of the transducer is labeled with one of the pairs. We want to use only
transition labels of the form $\langle \sigma, \sigma\rangle$, $\langle \sigma, \varepsilon\rangle$, or $\langle \varepsilon, \sigma\rangle$ ($\forall \sigma \in \Sigma$), while keeping paths
as short as possible. For example, (swum, swim) should be encoded either by the
sequence ⟨s,s⟩⟨w,w⟩⟨u,ε⟩⟨ε,i⟩⟨m,m⟩ or by ⟨s,s⟩⟨w,w⟩⟨ε,i⟩⟨u,ε⟩⟨m,m⟩, rather than
by the ill-formed ⟨s,s⟩⟨w,w⟩⟨u,i⟩⟨m,m⟩, or the sub-optimal ⟨s,ε⟩⟨w,ε⟩⟨u,ε⟩⟨m,ε⟩
⟨ε,s⟩⟨ε,w⟩⟨ε,i⟩⟨ε,m⟩.

We start with a 5-WFSM over the real tropical semiring [11]:

$$ A_1^{(5)} = \big( \langle \langle ?, ?, ?, ?, K\rangle_{\{1=2=3=4\}}, 0\rangle \cup \langle \langle \varepsilon, ?, @, ?, I\rangle_{\{2=4\}}, 1\rangle \cup \langle \langle ?, \varepsilon, ?, @, D\rangle_{\{1=3\}}, 1\rangle \big)^* \qquad (15) $$

where @ is a special symbol representing $\varepsilon$ in an alignment, {1=2=3=4} a
constraint requiring the ?'s on tapes 1 to 4 to be instantiated by the same
symbol [28], and 0 and 1 are weights.

Figure 6 shows the graph of $A_1^{(5)}$ and Figure 7 (rows 1-5) the purpose of
its tapes: Input word pairs $s^{(2)} = \langle s_1, s_2\rangle$ will be matched on tape 1 and 2, and
aligned output word pairs generated from tape 3 and 4. A symbol pair ⟨?,?⟩
read on tape 1 and 2 is identically mapped to ⟨?,?⟩ on tape 3 and 4, a ⟨ε,?⟩
is mapped to ⟨@,?⟩, and a ⟨?,ε⟩ to ⟨?,@⟩. $A_1^{(5)}$ will introduce @'s in $s_1$ (resp.
in $s_2$) at positions where $D^{(2)}$ shall have ⟨ε,σ⟩- (resp. ⟨σ,ε⟩-) transitions.⁶
Tape 5 generates a sequence of operation codes: K (keep), D (delete), I (insert).
For example, $A_1^{(5)}$ will map (swum, swim), among others, to (swu@m, sw@im) with
KKDIK and to (sw@um, swi@m) with KKIDK.

[Figure 6 not reproduced: a machine with the three transition labels (?,?,?,?,K){1=2=3=4}/0, (ε,?,@,?,I){2=4}/1 and (?,ε,?,@,D){1=3}/1.]

Fig. 6. Initial form $A_1^{(5)}$ of a word pair aligner

input word 1      s  w  u     m
input word 2      s  w     i  m
output word 1     s  w  u  @  m
output word 2     s  w  @  i  m
operation codes   K  K  D  I  K
weights           0  0  1  1  0

Fig. 7. Alignment of the word pair (swum, swim)

To remove redundant (duplicated) alignments, we prohibit an insertion to
be immediately followed by a deletion, via the constraint: $C^{(1)} = (K \cup I \cup D)^* - (?^*\, I\, D\, ?^*)$.
The constraint is imposed through join and the operations tape is
removed:

$$ Aligner^{(4)} = \bar{\pi}_{\{5\}}( A_1^{(5)} \bowtie_{\{5=1\}} C^{(1)} ) \qquad (16) $$

The $Aligner^{(4)}$ will map (swum, swim), among others, still to (swu@m, sw@im)
but not to (sw@um, swi@m). The best alignment (with minimal weight) can be
found through n-tape best-path search [15].
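
The behaviour of the constrained aligner can be imitated by enumerating alignments directly. The sketch below is a brute-force stand-in for the 4-WFSM and its best-path search, suitable only for short words; the function name and the tuple layout of the results are assumptions. It uses the weights of Eq. (15) and forbids an insertion immediately followed by a deletion, as $C^{(1)}$ does.

```python
def alignments(w1, w2):
    """Yield (weight, out1, out2, ops) for all K/D/I alignments of w1 and w2
    in which an insertion is never immediately followed by a deletion."""
    def rec(i, j, out1, out2, ops, cost):
        if i == len(w1) and j == len(w2):
            yield cost, out1, out2, ops
            return
        if i < len(w1) and j < len(w2) and w1[i] == w2[j]:     # K, weight 0
            yield from rec(i + 1, j + 1, out1 + w1[i], out2 + w2[j],
                           ops + "K", cost)
        if j < len(w2):                                         # I, weight 1
            yield from rec(i, j + 1, out1 + "@", out2 + w2[j],
                           ops + "I", cost + 1)
        if i < len(w1) and not ops.endswith("I"):               # D, weight 1
            yield from rec(i + 1, j, out1 + w1[i], out2 + "@",
                           ops + "D", cost + 1)
    yield from rec(0, 0, "", "", "", 0)

results = list(alignments("swum", "swim"))
print(min(results))
```

The minimum is the alignment (swu@m, sw@im) with codes KKDIK and weight 2; the redundant variant with codes KKIDK is never generated.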



5.5 Acronym and Meaning Extraction 

The automatic extraction of acronyms and their meaning from corpora is an 
important sub-task of text mining, and received much attention 37 3 2T3"51 e.g.]. 

It can be seen as a special case of string alignment between a text chunk 
and an acronym. For example, the chunk "they have many hidden Markov models" 
can be aligned with the acronym "HMMs" in different ways, such as "they have 
many hidden Markov models" or "they have many hidden Ma r kov models". Alternative 
alignments have different cost, and ideally the least costly one should give the 
correct meaning. 

An alignment-based approach can be implemented by means of a 3-WFSM
that reads a text chunk on tape 1 and an acronym on tape 2, and generates all
possible alignments on tape 3, inserting dots to mark letters used in the acronym.
For the above example this would give "they have many .hidden .Markov .model.s",
among others.
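
The dotted alignments of tape 3 can be enumerated by brute force for short chunks. The sketch below is not the 3-WFSM itself; in particular the cost function (one unit per acronym letter that is not word-initial, plus one unit per skipped word between the first and last used word) is a hypothetical choice made for this illustration, and a non-empty acronym without spaces is assumed.

```python
def acronym_alignments(chunk, acronym):
    """Yield (cost, dotted_chunk) for every way to read `acronym` as a
    case-insensitive subsequence of the letters of `chunk`."""
    acro, text = acronym.lower(), chunk.lower()
    word_of, starts, w = [], set(), 0
    for pos, ch in enumerate(chunk):
        if ch == " ":
            word_of.append(-1)
            w += 1
        else:
            word_of.append(w)                 # index of the word at this position
            if pos == 0 or chunk[pos - 1] == " ":
                starts.add(pos)               # word-initial position

    def rec(k, start, picked):
        if k == len(acro):
            yield picked
            return
        for pos in range(start, len(text)):
            if text[pos] == acro[k]:
                yield from rec(k + 1, pos + 1, picked + [pos])

    for picked in rec(0, 0, []):
        used = {word_of[p] for p in picked}
        non_initial = sum(p not in starts for p in picked)
        skipped = sum(1 for i in range(min(used), max(used) + 1) if i not in used)
        dotted = "".join(("." if p in picked else "") + ch
                         for p, ch in enumerate(chunk))
        yield non_initial + skipped, dotted

best = min(acronym_alignments("they have many hidden Markov models", "HMMs"))
print(best)
```

Under this cost the least costly alignment is the one that reads the acronym from "hidden Markov models".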

6 Later, we simply replace in $D^{(2)}$ all @ by ε.
The 3-WFSM can be generated from n-ary regular expressions that define the
task in as much detail as required (cf. Sec. 5.3 and 5.4). For a detailed description
see [15]. The best alignment, i.e., the most likely meaning of an acronym, is found
through n-tape best-path search [16].

The advantage of aligning via an n-WFSM rather than a classical alignment
matrix [36,30] is that the n-WFSM can be built from regular expressions that
define very subtle criteria, such as disallowing certain alignments or favoring
others based on weights that depend on long-distance context.

5.6 Cognate Search 

Extracting cognates with equal meaning from an English-German dictionary
$EG^{(3)}$ that encodes triples (English word, German word, part of speech), means
to identify all paths of $EG^{(3)}$ that have similar strings on tapes 1 and 2.

We create a similarity automaton $S^{(2)}$ that describes through weights the
degree of similarity between English and German words. This can either be
expressed through edit distance (cf. Sec. 5.3, 5.4, and 5.5) or through weighted
synchronic grapheme correspondences (e.g.: d-t, ght-cht, th-d, th-ss, ...):

$$ S^{(2)} = \big( \langle \langle ?, ?\rangle_{\{1=2\}}, w_0\rangle \cup \langle \langle d, t\rangle, w_1\rangle \cup \langle \langle ght, cht\rangle, w_2\rangle \cup \ldots \big)^* $$

When recognizing an English-German word pair, $S^{(2)}$ accepts either any two
equal symbols in the two words (via $\langle ?, ?\rangle_{\{1=2\}}$) or some English sequence and
its German correspondence (e.g. ght and cht) with some weight.

The set of cognates $EG_{cog}^{(3)}$ is obtained by joining the dictionary with the
similarity automaton: $EG_{cog}^{(3)} = EG^{(3)} \bowtie_{\{1=1, 2=2\}} S^{(2)}$.

$EG_{cog}^{(3)}$ contains all (and only) the cognates with equal meaning in $EG^{(3)}$ such
as (daughter, tochter, noun), (eight, acht, num), or (light, leicht, adj). Weights of
triples express similarity of words.

Note that this result cannot be achieved through ordinary transducer
composition. For example, composing $S^{(2)}$ with the English and the German words
separately: $\pi_{\langle 1\rangle}(EG^{(3)}) \circ S^{(2)} \circ \pi_{\langle 2\rangle}(EG^{(3)})$, also yields false cognates such as
(become, bekommen) ([to] obtain).
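
The difference between the join and the composition can be reproduced on a toy dictionary. In the sketch below the similarity relation is evaluated by a small dynamic program rather than by an automaton; the correspondence list, its weights and the four dictionary entries are hypothetical and chosen only so that the effect is visible.

```python
from functools import lru_cache

# Hypothetical correspondences and weights (tropical semiring, lower = more similar).
CORR = [("d", "t", 1), ("ght", "cht", 1), ("au", "o", 1), ("ei", "a", 1),
        ("c", "k", 1), ("m", "mm", 1), ("e", "en", 1)]
INF = float("inf")

def similarity(e, g):
    """Minimal weight with which the similarity relation accepts (e, g); INF if none."""
    @lru_cache(maxsize=None)
    def rec(i, j):
        if i == len(e) and j == len(g):
            return 0
        best = INF
        if i < len(e) and j < len(g) and e[i] == g[j]:    # equal symbols, weight 0
            best = rec(i + 1, j + 1)
        for x, y, w in CORR:
            if e.startswith(x, i) and g.startswith(y, j):
                best = min(best, w + rec(i + len(x), j + len(y)))
        return best
    return rec(0, 0)

def cognates_by_join(dictionary):
    """Join on {1=1,2=2}: only pairs that share a dictionary entry survive."""
    return {(e, g, pos): similarity(e, g) for e, g, pos in dictionary
            if similarity(e, g) < INF}

def cognates_by_composition(dictionary):
    """Composition with both projections: any English word may meet any German word."""
    english = {e for e, _, _ in dictionary}
    german = {g for _, g, _ in dictionary}
    return {(e, g): similarity(e, g) for e in english for g in german
            if similarity(e, g) < INF}

EG = [("daughter", "tochter", "noun"), ("eight", "acht", "num"),
      ("become", "werden", "verb"), ("get", "bekommen", "verb")]
print(cognates_by_join(EG))
print(cognates_by_composition(EG))
```

The join returns only the two true cognate entries, whereas the composition additionally pairs "become" with "bekommen", although no dictionary entry relates them.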

6 Conclusion 

The paper recalled basic definitions about n-ary weighted relations and their n- 
WFSMs, central operations on these relations and machines, and an algorithm 
for the important auto-intersection operation. 

It investigated the potential of n-WFSMs, w.r.t. classical 1- and 2-WFSMs 
(acceptors and transducers), in practical tasks. Through a series of applications, 
it exposed their augmented descriptive power and the importance of the join 
operation. Some of the applications are not feasible with 1- or 2-WFSMs. 

In the morphological analysis of Semitic languages, n-WFSMs have been
used to synchronize the vowels, consonants, and templatic pattern into a surface
form. In transduction cascades consisting of n-WFSMs, intermediate results can
be preserved and used by subsequent transductions. n-WFSMs permit not only
to map strings to strings or string m-tuples to k-tuples, but m-ary to k-ary
string relations, such as a non-aligned word pair to its aligned form, or to a
rewrite rule suitable for mapping one word to the other. In string alignment
tasks, an n-WFSM provides better control over the alignment process than a
classical alignment matrix, since it can be compiled from regular expressions
defining very subtle criteria, such as long-distance dependencies for weights.